Stickynote signalling

yellow := all good

pink := some assistance would be greatly appreciated

What is R?

What do R-users use R for?

We’re going to try to cover everything except Model.

This image is from R for Data Science, a great text to get started with. It’s available online as a free ebook.

A fitting dataset

To familiarise ourselves with R, we will do what R users do.

We will explore a dataset.

A fitting dataset.

A few images to explain why a dataset about witch trials might be appropriate for a workshop hosted by an advocacy group for underrepresented genders.

Image search for “witch” + “michelle wolf”

Image search for “witch” + “julia gillard”

Image search for “witch” + “hillary clinton”

Questions, questions that need answering

What would you like to know?

Group discussion

Write questions on whiteboard

How to answer these questions?

For this workshop:

Why these tools?

  • R was intentionally developed to be a data analysis language (aeroplane)
  • RStudio is designed to help users use R (airport)

“The plane is pretty boring without the airport around it.”

(Tip of the hat to Julia Lowndes for the aeroplane analogy.)

Installation hell

Installation overview

  • install R (quick)
  • install RStudio (a little while)

The installation instructions are adapted, with appreciation, from a previous workshop.

Install R on Mac or Windows

Go to the Comprehensive R Archive Network (CRAN) website.

It was the first result in a Google search for ‘cran’ in June 2018.


For Mac users

  • Click on Download R for (Mac) OS X
  • Look at the top link under Latest release, which at the time of writing is R-3.5.0.pkg, and download it if it is compatible with your version of macOS (Mavericks 10.9 or higher). Otherwise, download the version beneath it, which is compatible with older versions of macOS.

For Windows users

  • Click on Download R for Windows
  • Then click on the link install R for the first time
  • Download from the large link at the top of the page which at time of writing is Download R 3.5.0 for Windows.

Install R

  • Double click the downloaded installer file (for example, R-3.5.0.pkg on Mac) and follow the prompts to install the software.

Download & Install RStudio:

Go to the RStudio website.

It was first in a google search for ‘rstudio’ in June 2018.

Choose RStudio and scroll down to the blue Download RStudio Desktop button.

Click the green button to download RStudio Desktop Open Source License and select the appropriate installer for your operating system.

Double click the installer and follow the prompts to set up RStudio.

Linux

To install R on Debian or Ubuntu:

sudo apt-get update to update first

then

sudo apt-get install r-base to install R

To install RStudio on Debian-based distributions:

Download the .deb file and double click to open the package installer.

Updating R and RStudio

# Check R version
version
##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          3                           
## minor          4.4                         
## year           2018                        
## month          03                          
## day            15                          
## svn rev        74408                       
## language       R                           
## version.string R version 3.4.4 (2018-03-15)
## nickname       Someone to Lean On
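
The same information can be checked from inside a script. This is a minimal sketch using base R only; the 3.5.0 threshold is just an illustrative assumption, borrowed from the release named in the install instructions.

```r
# Print the full version string of the running R installation.
R.version.string

# getRversion() returns the version as an object that can be compared.
current <- getRversion()

# Compare against a version of interest (3.5.0 is only an example threshold).
is_recent <- current >= "3.5.0"
is_recent
```

If is_recent is FALSE, the running R is older than the threshold and an update may be worth considering.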

On Windows, R and R packages can be updated with the installr package, as described in this blogpost.

# Install the installr package if it is missing, then load it (Windows only).
if (!require(installr)) {
  install.packages("installr")
  library(installr)
}
# Using the package:
# updateR() # This starts the updating process of your R installation. It checks for newer versions and, if one is available, guides you through the decisions you need to make.

Why RStudio?

Working in an RStudio project has many benefits.

  • Free, open source software that runs R inside an integrated development environment (IDE)
  • Reproducible workflows

Getting started in RStudio

R-Ladies presenters gesticulate wildly at RStudio

Particularly useful panes:

  • Help
  • Console
  • Environment
  • Editor
  • Viewer

Cheat sheet

Help-Cheatsheets-RStudio IDE Cheatsheet

RStudio projects “make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents.”

A recommendation on how to organize your R project from Good Enough Practices for Scientific Computing as summarized by Software Carpentry

  • Put each project in its own directory, which is named after the project.
  • Put text documents associated with the project in the doc directory.
  • Put raw data and metadata in the data directory, and files generated during cleanup and analysis in a results directory.
  • Put source for the project’s scripts and programs in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.
  • Name all files to reflect their content or function.
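
The directory layout above can be sketched in base R. This is only an illustration: the project name my-project is hypothetical, and a temporary location is used so that nothing is written among your real files.

```r
# Hypothetical project name, placed in a temporary directory.
project_dir <- file.path(tempdir(), "my-project")

# The sub-directories recommended above.
sub_dirs <- c("doc", "data", "results", "src", "bin")

# Create the project directory, then each sub-directory inside it.
dir.create(project_dir, showWarnings = FALSE)
for (d in sub_dirs) {
  dir.create(file.path(project_dir, d), showWarnings = FALSE)
}

# List what was created.
list.files(project_dir)
```

In a real project you would replace the temporary path with the directory of your RStudio project.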

Your turn!

Now create an R project:

  • Open RStudio and create a project via File-New Project
  • Select New Directory and choose New Project
  • Name your project rcurious
  • Save the project directory wherever suits you

How to run code?!

Running code in the Console

The console is where you can execute single-line R commands.

The console is located, by default, in the lower left pane.

Try typing 3 + 2 in the Console and pressing enter. (From the Editor, run the current line using ctrl-enter or the Run button.)

# This is a comment: R ignores everything after the # on a line.
3 + 2
## [1] 5

We can annotate a script with comments using the #. R ignores everything from the # to the end of the line.

Installing packages

In R, the fundamental unit of shareable code is an R package.

R packages are available from the Comprehensive R Archive Network (CRAN) or GitHub.

We’re going to use the metapackage tidyverse available from CRAN to help us with our data analysis.


For installation; i.e., first time only.

install.packages("arbitrarypkg")

For loading every time we start a new R session. Typically this is at the top of the script.

library(arbitrarypkg)
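
A common pattern is to install a package only when it is missing. This is a minimal sketch: install_if_missing is a hypothetical helper name, not a function from any package, and stats is used as the example because it ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so no installation happens here.
install_if_missing("stats")

# Load the package for use in this session.
library(stats)
```

For the workshop, the same call with "tidyverse" would install it the first time and do nothing afterwards.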

I can store the number 5 in an object x.

To assign a value we use an arrow <-.

x <- 5

What happens when you type x into the Console after assigning the value 5 to it?

What do you see in the Environment pane?

(control + 8 to switch focus to Environment pane.)


x <- 5
x
## [1] 5
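
Objects can be used to build other objects. A short sketch, with y as a hypothetical second object, showing that an assignment stores a value at that moment rather than a live link:

```r
# Assign values to objects with the arrow.
x <- 5
y <- x * 2

# Reassigning x does not change y, which was computed from the old value.
x <- 100

x  # 100
y  # 10

# List the objects currently stored in the environment.
ls()
```

Both x and y should now appear in the Environment pane.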

Data structures in R

Data objects can be vectors of:

  • numbers
  • characters
  • logical

Or tables of data.
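
A minimal base R sketch of these structures. All object names and values here are made up for illustration.

```r
# Vectors hold values of a single type.
numbers    <- c(1.5, 2, 10)
characters <- c("witch", "trial", "city")
logicals   <- c(TRUE, FALSE, TRUE)

# class() reports the type of each vector.
class(numbers)     # "numeric"
class(characters)  # "character"
class(logicals)    # "logical"

# A table of data: each column is a vector of the same length.
tbl <- data.frame(
  id      = 1:3,
  label   = characters,
  flagged = logicals,
  stringsAsFactors = FALSE
)

dim(tbl)  # 3 rows, 3 columns
```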

Today we will work with a tidy data structure.

The documentation for the R package tidyverse is available online and on GitHub.

Source

We’ll do this analysis in R Markdown.

  • run code in chunks
  • write in text, LaTeX, or html outside of code chunks
  • make websites and presentations

Rmarkdown

Intro to .Rmd

Open File-New File-R Markdown

This will open an Untitled1.Rmd template.

Importing data

Data types

  • todo extrapolate

To insert a code chunk in your .Rmd: control+alt+i

To knit to .html in the Viewer pane: save and control+shift+k

Installing packages

We’re going to use the metapackage tidyverse to help us with our data analysis.

There are two important functions.

For installation; i.e., first time only.

install.packages("arbitrarypkg")

For loading, every time we start a new R session.

library(arbitrarypkg)

Install tidyverse

We would like to install a package called “tidyverse”. Let’s try.

Load tidyverse

# I ran: install.packages("tidyverse") in the console the first time.

library(tidyverse)

Import example: url

# Store a string (the address of a csv file) in an object.
url <- "https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv"

# Read data using a tidyverse function.
db <- read_csv(url)
# We can also write this function with the package name, i.e. the read_csv function from the readr package.
db <- readr::read_csv(url)
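
The package::function notation works for any installed package, including those that ship with R. This is a minimal sketch that swaps in base R's utils::read.csv for readr::read_csv and reads a small inline csv with made-up values, so it runs without a network connection.

```r
# A tiny csv held in a string; the values are made up for illustration.
csv_text <- "name,count
a,1
b,2"

# Call read.csv with its package name spelled out.
small <- utils::read.csv(text = csv_text)

# The same function without the prefix; utils is attached by default.
small_again <- read.csv(text = csv_text)

# Both calls return the same table.
identical(small, small_again)
```

Spelling out the package name is handy when two loaded packages define functions with the same name.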

Import data example

Data Source

The data for this import example lives on GitHub at https://github.com/JakeRuss/witch-trials. First, take a look at the link. Click on the data folder, then trials.csv. To import the data we need a direct link to the file itself. You can find this with the Raw button on GitHub. This is the url we want to use.

We will use a function called read_csv to load the data into an object called witchdat.

url <- "https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv"

witchdat <- read_csv(url)
## Parsed with column specification:
## cols(
##   year = col_integer(),
##   decade = col_integer(),
##   century = col_integer(),
##   tried = col_integer(),
##   deaths = col_integer(),
##   city = col_character(),
##   gadm.adm2 = col_character(),
##   gadm.adm1 = col_character(),
##   gadm.adm0 = col_character(),
##   lon = col_double(),
##   lat = col_double(),
##   record.source = col_character()
## )

The data has now been loaded into witchdat in R. Take a look at the Environment pane.
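
Once a table is loaded, a few base R functions give a first impression of it. This is a sketch on a small stand-in table called toy (a hypothetical name): the column names are borrowed from the column specification above, but the values are invented placeholders, not records from the dataset.

```r
# Stand-in table with invented placeholder values (NOT real trial records).
toy <- data.frame(
  year   = c(1600L, 1601L, 1602L),
  tried  = c(2L, 5L, 1L),
  deaths = c(1L, NA, 0L),
  city   = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# How many rows, and what are the columns called?
nrow(toy)
names(toy)

# Look at the first two rows.
head(toy, 2)

# Show the type of each column.
str(toy)

# Summarise a single column.
summary(toy$tried)
```

With the real data, replace toy with witchdat in each call.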

Exploratory data analysis

The command line

At some point you’ll need to use the command line.

To break ourselves in, let’s check what version of R we are running.

R --version
## R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
## Copyright (C) 2018 The R Foundation for Statistical Computing
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## R is free software and comes with ABSOLUTELY NO WARRANTY.
## You are welcome to redistribute it under the terms of the
## GNU General Public License versions 2 or 3.
## For more information about these matters see
## http://www.gnu.org/licenses/.
